ABSTRACT


Extreme Learning Machine (ELM) is widely regarded as a more effective learning
algorithm than conventional learning methods in terms of both learning speed
and generalization. In the traditional fuzzy inference method based on
"if-then" rules, all input and output items are assigned to the antecedent and
consequent parts, respectively. A major drawback is that the number of fuzzy
rules keeps increasing until the system and the arrangement of the rules
become complicated. The single input rule modules connected type fuzzy
inference (SIRM) method addresses this by combining the outputs of
single-input fuzzy rule modules. In this paper, we put forward a novel single
input rule modules model based on the extreme learning machine (denoted
SIRM-ELM) for solving data regression problems. In this hybrid model, the SIRM
concept is applied to the hidden neurons of ELM, each of which represents a
single-input fuzzy rule. Hence, the number of fuzzy rules equals the number of
hidden neurons of ELM. The effectiveness of the proposed SIRM-ELM model is
verified using sigmoid activation functions on several benchmark datasets and
on NOx emission data from a power generation plant. Experimental results
illustrate that the proposed SIRM-ELM model is capable of achieving a small
root mean square error, i.e., 0.027448 for the prediction of NOx emission.

1. INTRODUCTION


Lately, Extreme Learning Machine (ELM) has been acknowledged as a more effective learning
algorithm than conventional learning methods from the perspective of generalization and learning speed
[1-8]. The ELM suggested by Huang et al. is inspired by biological learning and addresses problems
pertaining to back-propagation (BP) learning algorithms. It is conjectured that certain parts of the brain
contain random neurons that are independent of their environment [1]. ELM is a learning scheme for the
so-called Single Layer Feedforward Network (SLFN), whose general architecture is illustrated in Figure 1.
ELM is capable of universal approximation with randomly generated input weights and biases [9].


Figure 1. Architecture of ELM


In the traditional fuzzy inference method based on "if-then" rules, all input and output items
are assigned to the antecedent and consequent parts, respectively. A major drawback is that the
number of fuzzy rules keeps increasing until the system and the arrangement of the rules become
complicated [10]. The single input rule modules connected type fuzzy inference (SIRM) method was
therefore proposed, in which the outputs of the single-input fuzzy rule modules are combined [11-16].
The SIRM method has been applied to the control of first- and second-order lag systems with dead time
[11-12], nonlinear function identification [10], anti-swing and positioning control of overhead
traveling cranes [13], and stabilization control of inverted pendulum systems [14-16], among others,
with decent results [17-22].

Assume a system with n input items and one output item; the system can also be extended to
multiple output items. The basic form of the SIRMs for n input items is:


$$\text{SIRM-}i:\ \left\{ R_i^j:\ \text{if } x_i = A_i^j \text{ then } \Delta u_i = C_i^j \right\}_{j=1}^{m_i} \qquad (1)$$


In (1), each SIRM corresponds to one of the n input items. SIRM-i denotes the module of the i-th
input item, $R_i^j$ is the j-th rule in SIRM-i, $x_i$ is the i-th input variable in the antecedent
part, and $\Delta u_i$ is the variable in the consequent part of SIRM-i. $A_i^j$ and $C_i^j$ are the
membership functions of $x_i$ and $\Delta u_i$, respectively, in the j-th rule of SIRM-i. Additionally,
i = 1, 2, ..., n is the index of the SIRMs and j = 1, 2, ..., $m_i$ is the index of the rules in SIRM-i.

This paper proposes an ELM-based model that hybridizes ELM with SIRM (hereafter denoted
SIRM-ELM). In SIRM-ELM, each rule is connected to only a single input. The rules are the hidden
neurons of ELM, and each of them represents a single-input fuzzy rule. Hence, the number of fuzzy
rules equals the number of hidden neurons of ELM.

The paper is organized as follows. Section 2 explains the learning algorithm of SIRM-ELM.
Section 3 presents the results on benchmark regression datasets (Abalone, Balloon, Strike and
Space-ga) used to test the proposed model's performance. Section 4 presents an application of the
proposed model to NOx emission in a power generation plant. Lastly, Section 5 summarizes the
important findings and gives suggestions for further work.


2. THE ALGORITHMS OF SIRM-ELM 
The structure of SIRM-ELM is illustrated in Figure 2. The stepwise training protocol is listed
below. Refer to Figure 2 for detailed definitions of the variables and parameters.


Step 1: Randomly set the input weights $a_i^j$ and biases $b_i^j$ (for i = 1, 2, ..., N and
j = 1, 2, 3) of the hidden neurons. Note that $a_i^j$ and $b_i^j$ are the parameters of the membership
function $A_i^j$ of the SIRM. The weights are generated as $\alpha D - \omega$, where D is a uniform
distribution function that randomly generates a number between 0 and 1, and $\alpha$ and $\omega$ are
parameters. By default, $\alpha = 2$ and $\omega = 1$. As a result, $a_i^j$ and $b_i^j$ lie in the range
of -1 to +1.
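
As a minimal sketch of Step 1, the following Python snippet draws each parameter as alpha * D - omega with D uniform on [0, 1). The function name, the nested-list layout (one row per input feature, one column per membership function) and the seed argument are illustrative assumptions and are not taken from the original MATLAB implementation.

```python
import random


def init_sirm_parameters(n_features, n_mf=3, alpha=2.0, omega=1.0, seed=None):
    """Draw a and b as alpha * D - omega, with D uniform on [0, 1).

    Returns two nested lists of shape n_features x n_mf (assumed layout).
    """
    rng = random.Random(seed)

    def draw():
        return alpha * rng.random() - omega

    a = [[draw() for _ in range(n_mf)] for _ in range(n_features)]
    b = [[draw() for _ in range(n_mf)] for _ in range(n_features)]
    return a, b


# Example: 8 input features (as in Abalone), 3 membership functions each.
a, b = init_sirm_parameters(n_features=8, seed=0)
```

With the default alpha = 2 and omega = 1, every generated value falls in [-1, 1), which matches the default range stated in Step 1.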





Figure 2. Overview of SIRM-ELM.
(a) General structure of the SIRM-ELM model (input, hidden neurons H, output); (b) General details of each hidden neuron.
Legend: x = input feature; A = membership function of x; N = index of input features; j = 3 (number of membership functions).


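Step 2: For the training pair $(x_{pi}, t_p)$, where $x_{pi}$ is the i-th feature of the p-th
training pair and $t_p$ is the target output (for p = 1, 2, ..., P), calculate the hidden layer output
matrix H based on the membership function $\mu(x_{pi}; a_i^j, b_i^j)$. For simplicity, the membership
function is denoted as $\mu_{pi}^j$,


$$\mu_{pi}^j = \mu(x_{pi}; a_i^j, b_i^j) = \frac{1}{1 + \exp\{-(a_i^j x_{pi} + b_i^j)\}} \qquad (2)$$

$$H = \begin{bmatrix}
\mu_{11}^1 & \mu_{11}^2 & \mu_{11}^3 & \mu_{12}^1 & \mu_{12}^2 & \cdots & \mu_{1N}^1 & \mu_{1N}^2 & \mu_{1N}^3 \\
\mu_{21}^1 & \mu_{21}^2 & \mu_{21}^3 & \mu_{22}^1 & \mu_{22}^2 & \cdots & \mu_{2N}^1 & \mu_{2N}^2 & \mu_{2N}^3 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
\mu_{P1}^1 & \mu_{P1}^2 & \mu_{P1}^3 & \mu_{P2}^1 & \mu_{P2}^2 & \cdots & \mu_{PN}^1 & \mu_{PN}^2 & \mu_{PN}^3
\end{bmatrix}_{P \times 3N} \qquad (3)$$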
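
The construction of H in Step 2 can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the nested-list data layout (samples as rows of X, parameters a and b indexed by feature and then by membership function) are assumptions, and the column order follows the feature-by-feature grouping of the hidden layer output matrix.

```python
import math


def sigmoid_membership(x, a, b):
    """Sigmoid membership value 1 / (1 + exp(-(a * x + b)))."""
    return 1.0 / (1.0 + math.exp(-(a * x + b)))


def hidden_layer_matrix(X, a, b):
    """Build H with one row per sample and one column per (feature, rule) pair.

    X is P x N; a and b are N x J, so H is P x (N * J).
    """
    H = []
    for sample in X:
        row = []
        for i, x in enumerate(sample):
            for a_ij, b_ij in zip(a[i], b[i]):
                row.append(sigmoid_membership(x, a_ij, b_ij))
        H.append(row)
    return H


# Toy example: P = 2 samples, N = 2 features, J = 3 membership functions.
X = [[0.0, 1.0], [0.5, -0.5]]
a = [[1.0, -1.0, 0.5], [0.0, 2.0, -2.0]]
b = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
H = hidden_layer_matrix(X, a, b)
```

Each input feature contributes J adjacent columns, so with J = 3 the matrix has 3N columns and every hidden neuron sees exactly one input.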


Step 3: Compute the output weights $\beta$. Since H is very likely a non-square matrix, its
inverse cannot be computed directly. To circumvent this problem, the Moore-Penrose pseudo-inverse is
utilized, and the output weights $\beta$ are obtained by (4),


$$\beta = (H^{T} H)^{-1} H^{T} T \qquad (4)$$

where T is the target output matrix, i.e., $T = [t_1\ t_2\ \cdots\ t_P]^{T}$.
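
A small worked sketch of Step 3 in pure Python follows. It solves the normal equations (H^T H) beta = H^T T by Gauss-Jordan elimination, which agrees with the Moore-Penrose solution only under the assumption that H has full column rank. The helper names are hypothetical, and in practice a library pseudo-inverse routine (for example `numpy.linalg.pinv` or MATLAB's `pinv`) would also cover the rank-deficient case.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(x * y for x, y in zip(row, col)) for col in Bt] for row in A]


def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[r]) + list(B[r]) for r in range(n)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[pivot][c]) < 1e-12:
            raise ValueError("H^T H is singular; use a true pseudo-inverse")
        M[c], M[pivot] = M[pivot], M[c]
        scale = M[c][c]
        M[c] = [v / scale for v in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def output_weights(H, T):
    """beta = (H^T H)^{-1} H^T T, assuming H has full column rank."""
    Ht = transpose(H)
    return solve(matmul(Ht, H), matmul(Ht, T))


# Toy example: targets generated exactly by t = 1 * h1 + 2 * h2.
H = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
T = [[1.0], [3.0], [5.0]]
beta = output_weights(H, T)
```

In this toy case H^T H = [[3, 3], [3, 5]] and H^T T = [[9], [13]], so the recovered weights are 1 and 2, as expected from how the targets were generated.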

Step 4: After the output weights of SIRM-ELM have been calculated, the prediction for a set of
new, unlabeled samples z can be computed, where A(.) is the membership function, h is the hidden layer
output and y is the prediction output.


$$A(z_{qi}; a_i^j, b_i^j) = \frac{1}{1 + \exp\{-(a_i^j z_{qi} + b_i^j)\}} \qquad (5)$$

where q = 1, 2, ..., Q and Q is the number of test samples.
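
The prediction in Step 4 can be sketched as below, assuming the usual ELM output form in which the hidden layer row h of a test sample is multiplied by the output weights, y = h * beta. That form, the function names and the flat list used for beta are assumptions made for illustration.

```python
import math


def membership(z, a, b):
    """Sigmoid membership value of a single input z."""
    return 1.0 / (1.0 + math.exp(-(a * z + b)))


def predict(Z, a, b, beta):
    """Return one prediction per test sample in Z (Q x N).

    a and b are N x J parameter lists; beta is a flat list of N * J weights
    ordered like the columns of the hidden layer row h.
    """
    predictions = []
    for sample in Z:
        h = [membership(z, a_ij, b_ij)
             for i, z in enumerate(sample)
             for a_ij, b_ij in zip(a[i], b[i])]
        predictions.append(sum(h_k * beta_k for h_k, beta_k in zip(h, beta)))
    return predictions


# Toy example: one feature, two membership functions.
a = [[1.0, -1.0]]
b = [[0.0, 0.0]]
beta = [2.0, 4.0]
y = predict([[0.0]], a, b, beta)
```

For the input z = 0 both membership values equal 0.5, so the toy prediction is 0.5 * 2 + 0.5 * 4 = 3.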
Step 5: After computing the output of ELM for the testing samples, calculate the root mean
squared error (RMSE), i.e.,


$$\text{RMSE}_{test} = \sqrt{\frac{1}{Q} \sum_{q=1}^{Q} (y_q - d_q)^2} \qquad (8)$$


where $y_q$ and $d_q$ are the predicted and actual outputs for $z_q$, respectively. The flowcharts in
Figure 3 and Figure 4 summarize the procedures of the stepwise protocol.
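
The RMSE of Step 5 is straightforward to compute. The following sketch uses only the Python standard library, and the function name is an illustrative choice.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equally long sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((y - d) ** 2 for y, d in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


# Toy example: a single error of 2 over three samples gives sqrt(4 / 3).
error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```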


Figure 3. Flowchart that represents Step 1 to Step 3 (it begins with loading the training datasets and randomly assigning the input weights and biases of the hidden neurons).

Figure 4. Flowchart that represents Step 4 (it begins with loading the testing datasets and all parameters of SIRM-ELM).


3. RESULTS AND DISCUSSION 

The applicability of the SIRM-ELM model is investigated in this section. Four benchmark
regression datasets from the UCI Machine Learning Repository (Abalone, Balloon, Strike and Space-ga)
were utilized for the performance evaluation of SIRM-ELM. Only the Additive Sigmoid hidden neuron
(SigAct) was utilized in the analysis. All analyses were run on a personal computer equipped with an
Intel(R) Core(TM) i7 2.9 GHz CPU and 8 GB RAM using MATLAB (R2010b), as detailed in Table 1. Table 2
lists the specifications of the datasets used in the experiments.

Table 1. Specification of personal computer and software packages utilized
for experiments and comparison.


Items                   Specification
Personal Computer       Asus
Operating System        Windows 8.1
CPU                     Intel(R) Core(TM) i7 2.5 GHz
RAM                     8 GB
Software                Matlab 7.11.0.584 (R2010b)
Programming Language    Matlab Language


Table 2. Specification of benchmark regression datasets.


Datasets    # Attributes    # Training Samples    # Testing Samples    # Total Samples
Abalone     8               3000                  1177                 4177
Balloon     2               1334                  667                  2001
Strike      6               416                   209                  625
Space-ga    6               2071                  1036                 3107


In all experiments, the four benchmark regression datasets were evaluated using the
train-validation-test method suggested in the literature [1]. The number of membership functions per
input attribute was tested for 1, 2 and 3 (i.e., j = 1, 2, 3) for all the regression datasets. In
addition, the RMSE is based on the default range of $a_i^j$ and $b_i^j$ for all rules (i.e., rules
1, 2, ..., 3N). Note that in SIRM-ELM the number of fuzzy rules equals the number of hidden neurons of
ELM. For each dataset, the experiments were conducted 50 times with random $a_i^j$ and $b_i^j$, and
the average results were recorded.
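
The repeated-run protocol can be sketched as follows. The `run_trial` callable stands for one complete train-and-test cycle with freshly drawn parameters, and `toy_trial` is a hypothetical placeholder that returns a made-up score so the example is self-contained. Reporting the spread alongside the mean is an addition for illustration, since only the average is reported in the experiments.

```python
import random
import statistics


def average_rmse(run_trial, n_runs=50):
    """Call run_trial(seed) n_runs times; return the mean and the spread."""
    scores = [run_trial(seed) for seed in range(n_runs)]
    return statistics.mean(scores), statistics.pstdev(scores)


def toy_trial(seed):
    # Hypothetical stand-in for one SIRM-ELM run with random a and b.
    rng = random.Random(seed)
    return 0.05 + 0.01 * rng.random()


mean_rmse, spread = average_rmse(toy_trial)
```

Using the run index as the seed keeps the 50 random draws different from each other while making the whole experiment repeatable.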


The results of the proposed SIRM-ELM were also compared with those of other methods. As seen
from Table 3, the RMSE of SIRM-ELM is lower than that of OS-ELM [21], SVM [21] and ELM [1] on the
Abalone, Balloon and Space-ga datasets. On the Strike dataset, SIRM-ELM (0.2656) outperforms ELM
(0.2985) but not SVM (0.2282). Note that SIRM-ELM performs better than OS-ELM on the Abalone dataset,
as it has only one parameter, compared with three parameters for OS-ELM.


Table 3. RMSE of SIRM-ELM, ELM [1], SVM [21] and OS-ELM [21]

Algorithm      Abalone    Balloon    Strike    Space-ga
SIRM-ELM       0.07598    0.04432    0.2656    0.03591
OS-ELM [21]    0.0771     -          -         -
SVM [21]       0.0764     0.059      0.2282    0.0648
ELM [1]        0.0761     0.0553     0.2985    0.0624


4. NOx EMISSION OF POWER GENERATION PLANT 

Nitrogen occurs naturally in the atmosphere as an inert gas, and the atmosphere contains
about 78% N2 by volume. NOx refers to nitrogen oxides, mainly nitrogen monoxide (also known as nitric
oxide, NO) and nitrogen dioxide (NO2). Other members of the family include nitrous oxide (N2O, known
as laughing gas), nitrogen pentoxide (N2O5) and nitrogen tetroxide (N2O4).

The presence of NOx in the atmosphere has direct and indirect effects on human health and on
ecosystems, i.e., animals and plants. NOx reacts with water, oxygen and other chemicals to form smog
and acidic pollutants, which lead to the formation of acid rain. In turn, acid rain, together with dry
deposition and cloud, may damage and deteriorate cars and buildings.

NOx is mainly released during the combustion of fossil fuels such as coal, oil and natural gas.
According to a European Environment Agency (EEA) technical report covering 1990-2013, 21% of the NOx
emissions in the European Union, approximately 1,600 kilotonnes, came from energy production and
distribution [23, 24]. Moreover, the power generation industry was expected to grow by 18.7 gigawatts
(GW) over 2016-2018, owing to the price and availability of natural gas. Hence, the prediction of NOx
emission is vital to the power generation sector and should not be taken lightly.

As an application, the NOx emission of an open cycle gas turbine in a power generation plant
(located in Port Dickson, Malaysia) was investigated [25]. The objective was to develop a neural
network model for the prediction of NOx emission. There are 150 input attributes taken from the
parameters of the power generation plant, such as the loading of the gas turbine, temperature and
pressure. The target output is the quantity of NOx (in ppm) emitted from the gas turbine.

A total of 3,405 data samples were collected for training and testing of SIRM-ELM, of which
2,270 were used for training and the remaining 1,135 for testing. The number of membership functions
per input attribute was tested for 1, 2 and 3 (i.e., j = 1, 2, 3), and the results are shown
in Table 4.


The results in Table 4 were obtained with $a_i^j$ and $b_i^j$ in their default setting (see
Step 1). The lowest root mean squared error (RMSE) in Table 4 is obtained with one membership function
per input attribute. With this number fixed at 1, $a_i^j$ and $b_i^j$ were then tuned over different
ranges in order to lower the RMSE further. The complete tuning results are recorded in
Table 5. The best RMSE in Table 5 is 0.028647.


Table 4. Results for NOx emission of SIRM-ELM using different numbers of membership functions.


Number of membership functions of an input attribute    RMSE
1                                                       0.030358
2                                                       0.056454
3                                                       0.805105


Table 5. Results for NOx emission of SIRM-ELM using different ranges of weights.


Range of a    Range of b    RMSE
-1 to +1      -1 to +1      0.030358
-2 to +2      -1 to +1      0.032502
0 to +1       -1 to +1      0.031703
-1 to 0       -1 to +1      0.031173
-1 to +1      0 to +1       0.033823
-1 to +1      -1 to 0       0.033294
-1 to +1      0.5 to +1     0.031032
-1 to +1      0.5           0.028647


In the experiment using ELM, 2/3 of the data samples were utilized for training while the
remaining 1/3 were utilized to determine the most suitable number of neurons of the parent ELM (i.e.,
L) through a validation process. For the sigmoid activation function of ELM, the training and
validation processes start with L = 50 units, and L is then increased in increments of 50 units. As an
example, Table 6 shows the testing processes based on the sigmoid activation function. Based on the
RMSE results in Figure 5, the best RMSE is 0.027086. Comparing Figure 5 with Table 5, the RMSE of ELM
(0.027086) is lower than that of SIRM-ELM (0.028647), which is attributed to the complexity of the
hidden neurons in ELM.
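
The search over the number of hidden neurons L can be sketched as a simple grid search. The start value and step of 50 follow the text, whereas the upper bound `max_neurons`, the function names and the toy validation curve are hypothetical and serve only to make the example runnable.

```python
def select_hidden_neurons(validation_rmse, start=50, step=50, max_neurons=500):
    """Try L = start, start + step, ... and keep the L with the lowest RMSE."""
    best_L, best_score = None, float("inf")
    for L in range(start, max_neurons + 1, step):
        score = validation_rmse(L)
        if score < best_score:
            best_L, best_score = L, score
    return best_L, best_score


def toy_validation_rmse(L):
    # Hypothetical validation curve with its minimum at L = 200.
    return 0.03 + 1e-7 * (L - 200) ** 2


best_L, best_score = select_hidden_neurons(toy_validation_rmse)
```

In a real experiment, `validation_rmse(L)` would train an ELM with L hidden neurons on the training portion of the data and return its RMSE on the validation portion.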


Figure 5. RMSE of NOx emission for ELM (L versus RMSE).


5. CONCLUSION

In essence, this paper presented a framework that combines the Extreme Learning Machine with the
Single Input Rule Module (hereafter denoted SIRM-ELM), which we regard as a significant innovation in
ELM methodology. Adopting the Single Input Rule Module in the ELM hidden layer can be a good
alternative to the commonly used activation function, i.e., Sigmoid (SigAct). SIRM-ELM was tested with
sigmoid activation functions on benchmark regression datasets, namely Abalone, Balloon, Strike and
Space-ga. The experimental results demonstrated that the proposed model achieved a lower RMSE than
OS-ELM [21], SVM [21] and ELM [1] on most of these datasets, as shown in Table 3. As for real-world
application, the low RMSE obtained by SIRM-ELM in predicting the NOx emitted by a power generation
plant suggests that the proposed method is applicable in power generation.